
Add celery broker configuration example using Amazon SQS #24019

Closed

Conversation

tsugumi-sys

@tsugumi-sys tsugumi-sys commented May 30, 2022

Added because there is no clear configuration example for using Amazon SQS as the Celery broker (#22863).

Note: dictionaries cannot be used in airflow.cfg because configparser cannot parse them, so you cannot set celery_broker_transport_options like this:

[celery_broker_transport_options]
region = ap-northeast-1
predefined_queues = {
    "my_q": {
        "url": "your_sqs_host",
        "access_key_id": "***",
        "secret_access_key": "***",
    }
}
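One way around the configparser limitation (a sketch, not part of this PR) is to keep the nested settings in a Python module and point Airflow's `[celery] celery_config_options` setting at it, since a Python module can hold nested dictionaries. The module name, queue name, URL, and credentials below are placeholders:

```python
# my_celery_config.py -- hypothetical module name; must be importable
# (e.g. on PYTHONPATH). In airflow.cfg you would then set:
#   [celery]
#   celery_config_options = my_celery_config.CELERY_CONFIG
# In practice you would likely merge these keys into
# airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG.

CELERY_CONFIG = {
    "broker_transport_options": {
        "region": "ap-northeast-1",
        "predefined_queues": {
            "my_q": {
                # Placeholder queue URL and credentials.
                "url": "https://ap-northeast-1.queue.amazonaws.com/000000000000/my_q",
                "access_key_id": "***",
                "secret_access_key": "***",
            },
        },
    },
}
```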

Closes #22863


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst, in newsfragments.

@tsugumi-sys tsugumi-sys requested review from kaxil and potiuk as code owners May 30, 2022 09:47
@boring-cyborg boring-cyborg bot added area:Scheduler including HA (high availability) scheduler kind:documentation labels May 30, 2022
@boring-cyborg

boring-cyborg bot commented May 30, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide, and consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@eladkal
Contributor

eladkal commented May 30, 2022

We had reports that predefined_queues isn't working properly with SQS #11225 (though it's on older Airflow versions)
Can you confirm it's working as it should?

@tsugumi-sys
Author

We had reports that predefined_queues isn't working properly with SQS #11225 (though it's on older Airflow versions) Can you confirm it's working as it should?

I also checked predefined_queues option and it didn't work in my local environment. This option is not used in my commit.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label May 31, 2022
@github-actions

The PR is likely ready to be merged. No tests are needed, as no important environment files nor Python files were modified by it. However, committers might decide that the full test matrix is needed and add the 'full tests needed' label. In that case you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

# Celery's environment variables.
export AWS_ACCESS_KEY_ID=your_aws_access_key_id
export AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
export AWS_DEFAULT_REGION=your_aws_region
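For context (a sketch, not part of the diff): with these credentials exported, Celery's SQS transport is selected by pointing the broker URL at SQS. A minimal airflow.cfg fragment might look like the following, assuming celery is installed with the sqs extra; the bare sqs:// URL makes the transport fall back to the environment credentials above:

[celery]
broker_url = sqs://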
Contributor

It would be great if you could also mention this setting: #21507

Author

@tsugumi-sys tsugumi-sys Jul 3, 2022

I added the environment variable AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL=true, but new queues are not created.

@@ -26,6 +26,17 @@ to work, you need to setup a Celery backend (**RabbitMQ**, **Redis**, ...) and
change your ``airflow.cfg`` to point the executor parameter to
``CeleryExecutor`` and provide the related Celery settings.

If you use Amazon SQS as a Celery broker, set environment variables like this:
Member

Can you please add a note that this is experimental support / use at your own risk? It would also be great to add something along the lines of "there is anecdotal evidence that Amazon SQS might be used as the broker backend".
We do not run any tests for SQS as a Celery broker (as opposed to RabbitMQ and Redis, for which we have integration tests).

We do not want to give users the false impression that they can expect it to work and then ask "I followed the documentation and it does not work" - unless we have CI/tests covering the scenario we are talking about.

Or maybe @ferruzzi @o-nikolas @john-jac - there could be some blog post or documentation from the Amazon team describing it. I believe MWAA uses SQS as the broker, and maybe one of you (or someone from the Amazon developer relations team) could write a blog post on "How to use SQS as a Celery broker for Airflow?" - that would be nice, and then any questions about it could go to Amazon. I am afraid that until we have system tests/integration with Amazon, making the statement "SQS works and here is the configuration" in the Airflow docs is a little too much of a "leap of faith" :)

Member

@potiuk potiuk left a comment

Also - this is important because it is not enough to just have those config settings. See here: #22863 (comment) - there are also some other things you need in Airflow to make use of SQS, like installing celery with the sqs extra and some additional prerequisites.

If we "claim" that Airflow supports SQS but we have no testing, and it does not work by default using the "official" Airflow image, it opens a "floodgate" of people who would like to get support because "we claimed it works", without us actually providing them instructions on how to do it.

Looking at the discussion in #22863 (comment), I am more convinced that if we are to mention SQS at all, it needs to be backed by someone who will take ownership and support people in case it does not work for them.

@potiuk
Member

potiuk commented Jul 25, 2022

Hmm. I thought a bit about it. And following the discussion in #25121 - I think we should not add "official documentation" for features that our users have tested, but that we have no automated tests for and do not ourselves know the potential risks, problems, etc. I think this falls into the same camp.

@BasPH @jedcunningham @feluelle - WDYT?

I think, @tsugumi-sys, that a better approach would be for you to (for example) write a short blog post about using SQS as a broker and describe your experiences with it - then we can even add it to the "Airflow" publication on Medium and link it from the Ecosystem page of Airflow: https://airflow.apache.org/ecosystem/

I think adding it to the official docs in an experimental phase, when we do not have any tests (nor the intention to add them), is IMHO something we should not do - even if we label it "Experimental" - because "experimental" means we are planning to run some experiment and turn it into a full feature (or drop it) when the experiment finishes. This is what happens with:

  • MsSQL support as the metadata DB
  • ARM64 support for the Docker image.

In both cases we already have tests in CI and we actively plan to act on and gather observations. For example, I think we might be inclined to drop MsSQL if the survey results do not show significant uptake in MsSQL usage after it has been officially "experimentally" supported in 2.3; https://airflow.apache.org/blog/airflow-survey-2022/ shows very little usage so far.

Adding information that SQS "works" to the official docs is a kind of statement we make to our users, and they might have certain expectations if they follow it.

I think I'd only be inclined to add "experimental" SQS support if we had some tests in our CI making sure that it works. (BTW, we have such tests - simple but regularly running - for RabbitMQ and Redis.)

Others - WDYT?

Member

@potiuk potiuk left a comment

We need to clarify whether we REALLY want to have such a statement in our docs.

@ferruzzi
Contributor

I think it's perfectly reasonable to not want it in the docs if we can't back up the claim with testing.

@jedcunningham
Member

I feel like the situation here is slightly different, as it is officially supported by Celery and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle 🤷

@potiuk
Member

potiuk commented Jul 28, 2022

I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle shrug

Actually, that might change my view. I did not realize SQS is on par with RabbitMQ/Redis for Celery. On the other hand, MAYBE this is a GREAT opportunity to add a test or two using Moto?

https://qxf2.com/blog/mocking-out-aws-sqs-using-moto/

Then we would be completely covered and it should be no problem to make it official. @ferruzzi @o-nikolas @vincbeck - WDYT?

@potiuk
Member

potiuk commented Aug 2, 2022

FYI @jedcunningham - those are the kinds of problems we might expect: #25486. I feel rather uncomfortable not having any tests around this. I am not even sure how we would approach diagnosing and fixing similar issues if we do not have any tests like that. There might be many reasons for this kind of failure (celery/sqs incompatibilities, our way of integrating with Celery, etc.).

@o-nikolas
Contributor

We had reports that predefined_queues isn't working properly with SQS #11225 (though it's on older Airflow versions) Can you confirm it's working as it should?

I also checked predefined_queues option and it didn't work in my local environment. This option is not used in my commit.

Predefined queues, or any other broker transport options can be configured by a plugin to get around the string vs object discrepancy.

@o-nikolas
Contributor

o-nikolas commented Aug 10, 2022

I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle shrug

Actually, that might change my view. I did not realize SQS is on par with RabbitMQ/Redis for Celery. On the other hand, MAYBE this is a GREAT opportunity to add a test or two using Moto?

https://qxf2.com/blog/mocking-out-aws-sqs-using-moto/

Then we would be completely covered and it should be no problem to make it official. @ferruzzi @o-nikolas @vincbeck - WDYT?

I have a love/hate relationship with using SQS as a celery broker :) (maybe this is for a blog post...)

As a tl;dr: it works, but it has a lot of gotchas, and I wouldn't necessarily recommend it generally to Airflow users.

SQS is serverless, scales great, and is relatively fast. But it doesn't actually have the same level of support as RabbitMQ and Redis as a Celery broker: namely, the Monitor and Remote Control APIs are not supported, so while it is stable, it has far fewer features. Also, SQS has the visibility timeout architecturally built in, which means tasks that run longer than that timeout will cause issues (the maximum value for the timeout is 12 hours).
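The visibility-timeout caveat can be made concrete: Celery's SQS transport takes the timeout (in seconds) via broker_transport_options, and SQS caps it at 12 hours. A hedged sketch; the region and values are placeholders, not a recommendation:

```python
# Sketch: Celery broker transport options for SQS, illustrating the
# visibility timeout ceiling described above.
MAX_SQS_VISIBILITY_TIMEOUT = 12 * 60 * 60  # 43200 seconds -- SQS hard maximum

broker_transport_options = {
    "region": "us-east-1",  # placeholder region
    # Messages (tasks) not acknowledged within this window become visible
    # again and may be redelivered to another worker, so no task should
    # run longer than this.
    "visibility_timeout": MAX_SQS_VISIBILITY_TIMEOUT,
}
```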

I still support adding automated testing for SQS though, I think that's a worthwhile task.

@potiuk
Member

potiuk commented Aug 21, 2022

That comment from @o-nikolas makes me think we are NOT ready to make this an official part of our documentation. But since a blog post was mentioned several times - if either of you would like to write one, I think we can publish it at https://medium.com/apache-airflow

@potiuk
Member

potiuk commented Aug 21, 2022

I think we can close it now. We can always reopen it if we think it is worthwhile, but I'd be rather sceptical about supporting SQS knowing the caveats.

@potiuk
Member

potiuk commented Sep 20, 2022

FYI: I think it was a good idea not to "pretend" we support SQS when we have no tests for it. An example of an interesting problem with it is here: #26521

Successfully merging this pull request may close these issues.

No clear celery configuration example using sqs
8 participants